A New Prediction Oriented Barrier Synchronization on SMP Clusters
Abstract
Clusters of Symmetric Multiprocessors (CSMP) are becoming an increasingly popular high-performance computing platform due to the commodity availability of multiprocessor nodes, mature SMP operating systems, low-latency, high-bandwidth data networks, and a superior price-performance ratio. Fast synchronization is crucial to making efficient use of SMP clusters. In this paper, we focus on one kind of synchronization in parallel computing: SMP cluster barrier (SCB) operations. Typically, parallel programs, whether for shared-memory multiprocessors (SMPs) or for CSMPs, alternate between computation and communication supersteps. The processors therefore synchronize between steps and arrive at the barriers in close temporal proximity. This observation encouraged us to design a new barrier algorithm that executes faster than traditional approaches while preserving scalability and portability. Our new CSMP barrier algorithm, PSCB, consists of two levels: one within each SMP, called the node barrier (NB), and one for the overall cluster, called the SMP cluster barrier (MPB). We improve the SCB by using predictive techniques on each SMP node to overlap the NB and MPB barrier levels. This method provides an efficient technique to overlap the different communications involved on either side of the barrier. Our work includes an experimental study of these barrier operations on several parallel architectures. These results support our claim of faster barrier synchronization.
Keywords: SMP, hybrid SMP clusters, barrier, prediction.
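The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' PSCB implementation: it simulates SMP nodes with Python threads, uses `threading.Barrier` in place of real intra-node and inter-node primitives, and shows only the plain, non-overlapped two-level barrier. The predictive overlap of the NB and MPB levels is not shown, because the abstract does not specify how it works. The function name `run_two_level_barrier` and its parameters are hypothetical.

```python
import threading


def run_two_level_barrier(num_nodes=2, threads_per_node=3, steps=2):
    """Simulate a plain two-level barrier and return an ordered event log.

    Assumption: each "node" is a group of threads in one process; a real
    SMP cluster would use shared memory inside a node and messages between
    nodes.
    """
    # Level 1 (NB): one reusable barrier per simulated SMP node.
    node_barriers = [threading.Barrier(threads_per_node) for _ in range(num_nodes)]
    # Level 2 (MPB): one barrier shared by a single representative per node.
    cluster_barrier = threading.Barrier(num_nodes)

    log = []
    log_lock = threading.Lock()

    def worker(node, local_rank):
        for step in range(steps):
            with log_lock:
                log.append(("arrive", step, node, local_rank))
            # Gather: wait until every thread on this node has arrived.
            node_barriers[node].wait()
            # Only the node representative takes part in the cluster barrier.
            if local_rank == 0:
                cluster_barrier.wait()
            # Release: the other threads wait here for the representative.
            node_barriers[node].wait()
            with log_lock:
                log.append(("leave", step, node, local_rank))

    threads = [
        threading.Thread(target=worker, args=(node, rank))
        for node in range(num_nodes)
        for rank in range(threads_per_node)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    events = run_two_level_barrier()
    print(len(events), "events recorded")
```

In this baseline, the node-level gather, the cluster-level barrier, and the node-level release run strictly one after another; the abstract's contribution is to overlap the node and cluster levels using prediction.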
Similar resources
Efficient Barrier Using Remote Memory Operations on VIA-Based Clusters
Most high-performance scientific applications require efficient support for collective communication. Point-to-point message-passing communication in current-generation clusters is based on the Send/Recv communication model. Collective communication operations built on top of such point-to-point message-passing operations might achieve suboptimal performance. VIA and the emerging InfiniBand archit...
Multiple Networks for Heterogeneous Distributed Applications PDPTA’07
In our distributed applications, we have found that the network is the main limiting factor for performance on clusters. Indeed, clusters are cheap, and it is easier to add more nodes to extend the computing capacity than to switch to costly high-performance networks. Consequently, developers should take special care of communication and synchronization in their application design. The...
The Florida State University College of Arts
Clusters of Symmetric Multiprocessing (SMP) nodes with multi-core Chip Multiprocessors (CMP), also known as SMP-CMP clusters, are ubiquitous today. Message Passing Interface (MPI) is the de facto standard for developing message passing applications for such clusters. Most modern SMP-CMP clusters support Remote Direct Memory Access (RDMA), which allows for flexible and efficient communication sc...
Unifying Barrier and Point-to-Point Synchronization in OpenMP with Phasers
OpenMP is a widely used standard for parallel programming on a broad range of SMP systems. In the OpenMP programming model, synchronization points are specified by implicit or explicit barrier operations. However, certain classes of computations, such as stencil algorithms, need to specify synchronization only among particular tasks/threads so as to support pipeline parallelism with better synchro...
Run-Time Support for Multi-tier Programming of Block-Structured Applications on SMP Clusters
We present a small set of programming abstractions to simplify efficient implementations for block-structured scientific calculations on SMP clusters. We have implemented these abstractions in KeLP 2.0, a C++ class library. KeLP 2.0 provides hierarchical SMPD control flow to manage two levels of parallelism and locality. Additionally, to tolerate slow inter-node communication costs, KeLP 2.0 combine...